Data Visualisation using R

Author

Dr Afiq Amsyar

Published

August 13, 2024

Getting started with Data Visualization using R

“The simple graph has brought more information to the data analyst’s mind than any other device.” — John Tukey

Visualisation is a fundamentally human activity. A good visualisation will show you things that you did not expect, or raise new questions about the data. A good visualisation might also hint that you’re asking the wrong question, or you need to collect different data. Visualisations can surprise you, but don’t scale particularly well because they require a human to interpret them.

Graphics packages in R

There are many graphics packages in R. Some handle general-purpose plotting, while others provide specialised graphics for particular analyses.

The popular general graphics packages in R are:

  1. graphics : a base R package

  2. ggplot2 : a user-contributed package by Hadley Wickham

  3. lattice : a user-contributed package

Note:

  • Do you remember what a package is in R?

  • Where can you learn more about R packages? Search for CRAN Task Views.

Except for the graphics package (a base R package), the other packages need to be downloaded and installed into your R library.
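As a minimal sketch of that installation step, the code below checks which of the three general packages are already available and reports the ones that are missing. The variable names (pkgs, installed, missing_pkgs) are illustrative, and the install.packages() call is left commented out because it needs an internet connection.

```r
# Packages discussed above; "graphics" ships with base R
pkgs <- c("graphics", "ggplot2", "lattice")

# TRUE for each package that can already be loaded from your library
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
print(installed)

# Report whatever is missing; uncomment the last line to install from CRAN
missing_pkgs <- pkgs[!installed]
if (length(missing_pkgs) > 0) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)
}
```

Once a package is installed, library(ggplot2) (for example) attaches it for the current session.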

Examples of more specialised packages, which produce graphics for particular analyses, are:

  1. survminer::ggsurvplot

  2. sjPlot

1.0 Using Basic Plot Function

Import data into R environment

aiq <- airquality

Daily air quality measurements in New York, May to September 1973.

Format

A data frame with 153 observations on 6 variables.

[,1] Ozone numeric Ozone (ppb)
[,2] Solar.R numeric Solar R (lang)
[,3] Wind numeric Wind (mph)
[,4] Temp numeric Temperature (degrees F)
[,5] Month numeric Month (1–12)
[,6] Day numeric Day of month (1–31)

Details

Daily readings of the following air quality values for May 1, 1973 (a Tuesday) to September 30, 1973.

  • Ozone: Mean ozone in parts per billion from 1300 to 1500 hours at Roosevelt Island

  • Solar.R: Solar radiation in Langleys in the frequency band 4000–7700 Angstroms from 0800 to 1200 hours at Central Park

  • Wind: Average wind speed in miles per hour at 0700 and 1000 hours at LaGuardia Airport

  • Temp: Maximum daily temperature in degrees Fahrenheit at LaGuardia Airport.
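Before plotting, it can help to confirm that the data match this description. The sketch below uses only base R functions. The name na_counts is illustrative.

```r
# Copy the built-in data set, as in the import step above
aiq <- airquality

# Number of rows (observations) and columns (variables)
dim(aiq)

# Type and first few values of each variable
str(aiq)

# Minimum, quartiles, mean, maximum and NA count per variable
summary(aiq)

# Missing values per variable; plots silently skip these
na_counts <- colSums(is.na(aiq))
print(na_counts)
```

Ozone and Solar.R contain missing values, which is worth keeping in mind when reading the plots that follow.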

Barplot

A bar plot represents each data value as a bar whose length is proportional to that value. There are two types of bar plots: horizontal and vertical. They are generally used to plot a continuous value for each category or observation. Setting the horiz parameter to TRUE gives a horizontal bar plot; setting it to FALSE (the default) gives a vertical one.

# Horizontal bar plot of
# ozone concentration in air
barplot(aiq$Ozone,
        main = 'Ozone Concentration in air',
        xlab = 'Ozone Levels', horiz = TRUE)

# Vertical bar plot of
# ozone concentration in air
# (the values are now on the y-axis, so label that axis)
barplot(aiq$Ozone, main = 'Ozone Concentration in air',
        ylab = 'Ozone Levels', col = 'blue', horiz = FALSE)

Bar plots are used for the following scenarios:

  • To perform a comparative study between the various data categories in the data set.

  • To analyze the change of a variable over time in months or years.
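To illustrate the first scenario, here is a minimal sketch that compares categories: it averages the ozone readings within each month and draws one bar per month. The name mean_ozone and the axis labels are illustrative choices, not part of the original example.

```r
aiq <- airquality

# Mean ozone per month; na.rm = TRUE drops the missing readings
mean_ozone <- tapply(aiq$Ozone, aiq$Month, mean, na.rm = TRUE)

# Replace month numbers (5 to 9) with abbreviated month names
names(mean_ozone) <- month.abb[as.integer(names(mean_ozone))]

# One bar per month makes the categories easy to compare
barplot(mean_ozone,
        main = "Mean ozone concentration by month",
        xlab = "Month", ylab = "Mean ozone (ppb)",
        col = "blue")
```

Plotting a summary per category is usually easier to read than plotting all 153 daily values as separate bars.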

Histogram

A histogram resembles a bar chart in that it uses bars of varying height to represent the distribution of the data. In a histogram, however, continuous values are grouped into consecutive intervals called bins, whose width can be varied.

# Histogram for Maximum Daily Temperature

hist(aiq$Temp, main ="Maximum Temperature(Daily)",
    xlab ="Temperature(Fahrenheit)",
    xlim = c(50, 125), col ="yellow",
    freq = TRUE)

For a histogram, the parameter xlim specifies the range of the x-axis within which the values are displayed.
When the parameter freq is set to TRUE, the y-axis shows the frequency (count) of values in each bin; when it is set to FALSE, the y-axis shows probability densities, scaled so that the total area of the histogram adds up to one.
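The effect of freq = FALSE can be checked directly. In this sketch, the object returned by hist() is stored (the names h, bin_widths and total_area are illustrative) and the bar areas are summed, where each area is the density multiplied by the bin width.

```r
aiq <- airquality

# Draw the histogram on the density scale and keep the returned object
h <- hist(aiq$Temp, freq = FALSE,
          main = "Maximum Temperature (Daily)",
          xlab = "Temperature (Fahrenheit)",
          col = "yellow")

# Width of each bin, taken from consecutive break points
bin_widths <- diff(h$breaks)

# Area of each bar is density * width; together they should sum to one
total_area <- sum(h$density * bin_widths)
print(total_area)
```

The density scale is useful when comparing histograms built from different numbers of observations, because the bar heights no longer depend on the sample size.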

Box Plot

A boxplot presents a statistical summary of the data graphically. It depicts the minimum and maximum data points, the median, the first and third quartiles, and the interquartile range.

# Box plot for average wind speed
boxplot(aiq$Wind, main = "Average wind speed",
        xlab = "Miles per hour", ylab = "Wind",
        col = "orange", border = "black",
        horizontal = TRUE)
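The numbers behind the box can be inspected with boxplot.stats(). The sketch below is illustrative (wind_stats and five_num are hypothetical names). Note that with the default settings the whiskers extend to the most extreme points within 1.5 times the interquartile range of the box, so they match the minimum and maximum only when there are no outliers.

```r
aiq <- airquality

# Statistics that boxplot() uses to draw the box and whiskers
wind_stats <- boxplot.stats(aiq$Wind)

# Label the five values: whisker ends, hinges (quartiles) and median
five_num <- setNames(
  wind_stats$stats,
  c("lower whisker", "lower hinge", "median", "upper hinge", "upper whisker")
)
print(five_num)

# Points beyond the whiskers, drawn individually as outliers
print(wind_stats$out)

# Interquartile range of wind speed
print(IQR(aiq$Wind))
```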

Multiple box plots can also be generated at once through the following code:

# Multiple Box plots, each representing
# an Air Quality Parameter
boxplot(aiq[, 1:4], 
        main ='Box Plots for Air Quality Parameters')

Scatter Plot

A scatter plot is composed of many points on a Cartesian plane. Each point denotes the values taken by two variables for one observation, which helps us easily identify the relationship between them.

# Scatter plot of temperature against ozone concentration

plot(aiq$Ozone, aiq$Temp,
     main = "Scatterplot Example",
     xlab = "Ozone Concentration in parts per billion",
     ylab = "Temperature (degrees F)", pch = 19)

plot(aiq) # scatter plot matrix of every pair of variables
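One way to make the relationship in a scatter plot more explicit is to overlay a fitted line and compute a correlation. This is a sketch under the assumption that a straight line is a reasonable first summary; the names fit and r are illustrative.

```r
aiq <- airquality

# Same scatter plot as above
plot(aiq$Ozone, aiq$Temp,
     main = "Temperature against ozone",
     xlab = "Ozone Concentration in parts per billion",
     ylab = "Temperature (degrees F)", pch = 19)

# Simple linear regression; lm() drops rows with missing values by default
fit <- lm(Temp ~ Ozone, data = aiq)

# Overlay the fitted line on the existing plot
abline(fit, col = "red", lwd = 2)

# Pearson correlation using only the complete pairs
r <- cor(aiq$Ozone, aiq$Temp, use = "complete.obs")
print(r)
```

If the points curve away from the line, a straight-line summary understates the relationship and a smoother or a transformation may describe it better.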

2.0 Using ggplot2 function

ggplot2 is an R package dedicated to data visualization. It can greatly improve the quality and aesthetics of your graphics, and can make you more efficient in creating them.

To work with ggplot2, remember

  • start with: ggplot()

  • which data: data = X

  • which variables: aes(x = , y = )

  • which graph: geom_histogram(), geom_point()
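The four steps above can be sketched as follows. To keep the example self-contained it uses the built-in airquality data from section 1.0 rather than a package data set; the object names p_hist and p_point and the binwidth value are illustrative.

```r
library(ggplot2)

# start with ggplot(), name the data, map variables with aes(),
# then add a geom to choose the kind of graph
p_hist <- ggplot(data = airquality, aes(x = Temp)) +
  geom_histogram(binwidth = 5)

# Two variables: one on each axis, drawn as points
p_point <- ggplot(data = airquality, aes(x = Wind, y = Temp)) +
  geom_point()

# Printing a ggplot object draws it
print(p_hist)
print(p_point)
```

Because the plot is built by adding layers with +, the same first line can be reused with a different geom to change the type of graph.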

The official website for ggplot2 is here http://ggplot2.org/.

Load Package

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(gcookbook)

The library(tidyverse) line loads the core tidyverse packages, which you will use in almost every data analysis. The gcookbook package provides the heightweight data set used below.

Data Frame

heightweight 
    sex ageYear ageMonth heightIn weightLb
1     f   11.92      143     56.3     85.0
2     f   12.92      155     62.3    105.0
3     f   12.75      153     63.3    108.0
4     f   13.42      161     59.0     92.0
5     f   15.92      191     62.5    112.5
6     f   14.25      171     62.5    112.0
7     f   15.42      185     59.0    104.0
8     f   11.83      142     56.5     69.0
9     f   13.33      160     62.0     94.5
10    f   11.67      140     53.8     68.5
11    f   11.58      139     61.5    104.0
12    f   14.83      178     61.5    103.5
13    f   13.08      157     64.5    123.5
14    f   12.42      149     58.3     93.0
15    f   11.92      143     51.3     50.5
16    f   12.08      145     58.8     89.0
17    f   15.92      191     65.3    107.0
18    f   12.50      150     59.5     78.5
19    f   12.25      147     61.3    115.0
20    f   15.00      180     63.3    114.0
21    f   11.75      141     61.8     85.0
22    f   11.67      140     53.5     81.0
23    f   13.67      164     58.0     83.5
24    f   14.67      176     61.3    112.0
25    f   15.42      185     63.3    101.0
26    f   13.83      166     61.5    103.5
27    f   14.58      175     60.8     93.5
28    f   15.00      180     59.0    112.0
29    f   17.50      210     65.5    140.0
30    f   12.17      146     56.3     83.5
31    f   14.17      170     64.3     90.0
32    f   13.50      162     58.0     84.0
33    f   12.42      149     64.3    110.5
34    f   11.58      139     57.5     96.0
35    f   15.50      186     57.8     95.0
36    f   16.42      197     61.5    121.0
37    f   14.08      169     62.3     99.5
38    f   14.75      177     61.8    142.5
39    f   15.42      185     65.3    118.0
40    f   15.17      182     58.3    104.5
41    f   14.42      173     62.8    102.5
42    f   13.83      166     59.3     89.5
43    f   14.00      168     61.5     95.0
44    f   14.08      169     62.0     98.5
45    f   12.50      150     61.3     94.0
46    f   15.33      184     62.3    108.0
47    f   11.58      139     52.8     63.5
48    f   12.25      147     59.8     84.5
49    f   12.00      144     59.5     93.5
50    f   14.75      177     61.3    112.0
51    f   14.83      178     63.5    148.5
52    f   16.42      197     64.8    112.0
53    f   12.17      146     60.0    109.0
54    f   12.08      145     59.0     91.5
55    f   12.25      147     55.8     75.0
56    f   12.08      145     57.8     84.0
57    f   12.92      155     61.3    107.0
58    f   13.92      167     62.3     92.5
59    f   15.25      183     64.3    109.5
60    f   11.92      143     55.5     84.0
61    f   15.25      183     64.5    102.5
62    f   15.42      185     60.0    106.0
63    f   12.33      148     56.3     77.0
64    f   12.25      147     58.3    111.5
65    f   12.83      154     60.0    114.0
66    f   13.00      156     54.5     75.0
67    f   12.00      144     55.8     73.5
68    f   12.83      154     62.8     93.5
69    f   12.67      152     60.5    105.0
70    f   15.92      191     63.3    113.5
71    f   15.83      190     66.8    140.0
72    f   11.67      140     60.0     77.0
73    f   12.33      148     60.5     84.5
74    f   15.75      189     64.3    113.5
75    f   11.92      143     58.3     77.5
76    f   14.83      178     66.5    117.5
77    f   13.67      164     65.3     98.0
78    f   13.08      157     60.5    112.0
79    f   12.25      147     59.5    101.0
80    f   12.33      148     59.0     95.0
81    f   14.75      177     61.3     81.0
82    f   14.25      171     61.5     91.0
83    f   14.33      172     64.8    142.0
84    f   15.83      190     56.8     98.5
85    f   15.25      183     66.5    112.0
86    f   11.92      143     61.5    116.5
87    f   14.92      179     63.0     98.5
88    f   15.50      186     57.0     83.5
89    f   15.17      182     65.5    133.0
90    f   15.17      182     62.0     91.5
91    f   11.83      142     56.0     72.5
92    f   13.75      165     61.3    106.5
93    f   13.75      165     55.5     67.0
94    f   12.83      154     61.0    122.5
95    f   12.50      150     54.5     74.0
96    f   12.92      155     66.0    144.5
97    f   13.58      163     56.5     84.0
98    f   11.75      141     56.0     72.5
99    f   12.25      147     51.5     64.0
100   f   17.50      210     62.0    116.0
101   f   14.25      171     63.0     84.0
102   f   13.92      167     61.0     93.5
103   f   15.17      182     64.0    111.5
104   f   12.00      144     61.0     92.0
105   f   16.08      193     59.8    115.0
106   f   11.75      141     61.3     85.0
107   f   13.67      164     63.3    108.0
108   f   15.50      186     63.5    108.0
109   f   14.08      169     61.5     85.0
110   f   14.58      175     60.3     86.0
111   f   15.00      180     61.3    110.5
112   m   13.75      165     64.8     98.0
113   m   13.08      157     60.5    105.0
114   m   12.00      144     57.3     76.5
115   m   12.50      150     59.5     84.0
116   m   12.50      150     60.8    128.0
117   m   11.58      139     60.5     87.0
118   m   15.75      189     67.0    128.0
119   m   15.25      183     64.8    111.0
120   m   12.25      147     50.5     79.0
121   m   12.17      146     57.5     90.0
122   m   13.33      160     60.5     84.0
123   m   13.00      156     61.8    112.0
124   m   14.42      173     61.3     93.0
125   m   12.58      151     66.3    117.0
126   m   11.75      141     53.3     84.0
127   m   12.50      150     59.0     99.5
128   m   13.67      164     57.8     95.0
129   m   12.75      153     60.0     84.0
130   m   17.17      206     68.3    134.0
132   m   14.67      176     63.8     98.5
133   m   14.67      176     65.0    118.5
134   m   11.67      140     59.5     94.5
135   m   15.42      185     66.0    105.0
136   m   15.00      180     61.8    104.0
137   m   12.17      146     57.3     83.0
138   m   15.25      183     66.0    105.5
139   m   11.67      140     56.5     84.0
140   m   12.58      151     58.3     86.0
141   m   12.58      151     61.0     81.0
142   m   12.00      144     62.8     94.0
143   m   13.33      160     59.3     78.5
144   m   14.83      178     67.3    119.5
145   m   16.08      193     66.3    133.0
146   m   13.50      162     64.5    119.0
147   m   13.67      164     60.5     95.0
148   m   15.50      186     66.0    112.0
149   m   11.92      143     57.5     75.0
150   m   14.58      175     64.0     92.0
151   m   14.58      175     68.0    112.0
152   m   14.58      175     63.5     98.5
153   m   14.42      173     69.0    112.5
154   m   14.17      170     63.8    112.5
155   m   14.50      174     66.0    108.0
156   m   13.67      164     63.5    108.0
157   m   12.00      144     59.5     88.0
158   m   13.00      156     66.3    106.0
159   m   12.42      149     57.0     92.0
160   m   12.00      144     60.0    117.5
161   m   12.25      147     57.0     84.0
162   m   15.67      188     67.3    112.0
163   m   14.08      169     62.0    100.0
164   m   14.33      172     65.0    112.0
165   m   12.50      150     59.5     84.0
166   m   16.08      193     67.8    127.5
167   m   13.08      157     58.0     80.5
168   m   14.00      168     60.0     93.5
169   m   11.67      140     58.5     86.5
170   m   13.00      156     58.3     92.5
171   m   13.00      156     61.5    108.5
172   m   13.17      158     65.0    121.0
173   m   15.33      184     66.5    112.0
174   m   13.00      156     68.5    114.0
175   m   12.00      144     57.0     84.0
176   m   14.67      176     61.5     81.0
177   m   14.00      168     66.5    111.5
178   m   12.42      149     52.5     81.0
179   m   11.83      142     55.0     70.0
180   m   15.67      188     71.0    140.0
181   m   16.92      203     66.5    117.0
182   m   11.83      142     58.8     84.0
183   m   15.75      189     66.3    112.0
184   m   15.67      188     65.8    150.5
185   m   16.67      200     71.0    147.0
186   m   12.67      152     59.5    105.0
187   m   14.50      174     69.8    119.5
188   m   13.83      166     62.5     84.0
189   m   12.08      145     56.5     91.0
190   m   11.92      143     57.5    101.0
191   m   13.58      163     65.3    117.5
192   m   13.83      166     67.3    121.0
193   m   15.17      182     67.0    133.0
194   m   14.42      173     66.0    112.0
195   m   12.92      155     61.8     91.5
196   m   13.50      162     60.0    105.0
197   m   14.75      177     63.0    111.0
198   m   14.75      177     60.5    112.0
199   m   14.58      175     65.5    114.0
200   m   13.83      166     62.0     91.0
201   m   12.50      150     59.0     98.0
202   m   12.50      150     61.8    118.0
203   m   15.67      188     63.3    115.5
204   m   13.58      163     66.0    112.0
205   m   14.25      171     61.8    112.0
206   m   13.50      162     63.0     91.0
207   m   11.75      141     57.5     85.0
208   m   14.50      174     63.0    112.0
209   m   11.83      142     56.0     87.5
210   m   12.33      148     60.5    118.0
211   m   11.67      140     56.8     83.5
212   m   13.33      160     64.0    116.0
213   m   12.00      144     60.0     89.0
214   m   17.17      206     69.5    171.5
215   m   13.25      159     63.3    112.0
216   m   12.42      149     56.3     72.0
217   m   16.08      193     72.0    150.0
218   m   16.17      194     65.3    134.5
219   m   12.67      152     60.8     97.0
220   m   12.17      146     55.0     71.5
221   m   11.58      139     55.0     73.5
222   m   15.50      186     66.5    112.0
223   m   13.42      161     56.8     75.0
224   m   12.75      153     64.8    128.0
225   m   16.33      196     64.5     98.0
226   m   13.67      164     58.0     84.0
227   m   13.25      159     62.8     99.0
228   m   14.83      178     63.8    112.0
229   m   12.75      153     57.8     79.5
230   m   12.92      155     57.3     80.5
231   m   14.83      178     63.5    102.5
232   m   11.83      142     55.0     76.0
233   m   13.67      164     66.5    112.0
234   m   15.75      189     65.0    114.0
235   m   13.67      164     61.5    140.0
236   m   13.92      167     62.0    107.5
237   m   12.58      151     59.3     87.0

2.1 Scatter Plot

Scatter plots are used to display the relationship between two continuous variables. In a scatter plot, each observation in a data set is represented by a point.

Basic Scatter Plot

heightweight %>%
  select(ageYear, heightIn)
    ageYear heightIn
1     11.92     56.3
2     12.92     62.3
3     12.75     63.3
4     13.42     59.0
5     15.92     62.5
6     14.25     62.5
7     15.42     59.0
8     11.83     56.5
9     13.33     62.0
10    11.67     53.8
11    11.58     61.5
12    14.83     61.5
13    13.08     64.5
14    12.42     58.3
15    11.92     51.3
16    12.08     58.8
17    15.92     65.3
18    12.50     59.5
19    12.25     61.3
20    15.00     63.3
21    11.75     61.8
22    11.67     53.5
23    13.67     58.0
24    14.67     61.3
25    15.42     63.3
26    13.83     61.5
27    14.58     60.8
28    15.00     59.0
29    17.50     65.5
30    12.17     56.3
31    14.17     64.3
32    13.50     58.0
33    12.42     64.3
34    11.58     57.5
35    15.50     57.8
36    16.42     61.5
37    14.08     62.3
38    14.75     61.8
39    15.42     65.3
40    15.17     58.3
41    14.42     62.8
42    13.83     59.3
43    14.00     61.5
44    14.08     62.0
45    12.50     61.3
46    15.33     62.3
47    11.58     52.8
48    12.25     59.8
49    12.00     59.5
50    14.75     61.3
51    14.83     63.5
52    16.42     64.8
53    12.17     60.0
54    12.08     59.0
55    12.25     55.8
56    12.08     57.8
57    12.92     61.3
58    13.92     62.3
59    15.25     64.3
60    11.92     55.5
61    15.25     64.5
62    15.42     60.0
63    12.33     56.3
64    12.25     58.3
65    12.83     60.0
66    13.00     54.5
67    12.00     55.8
68    12.83     62.8
69    12.67     60.5
70    15.92     63.3
71    15.83     66.8
72    11.67     60.0
73    12.33     60.5
74    15.75     64.3
75    11.92     58.3
76    14.83     66.5
77    13.67     65.3
78    13.08     60.5
79    12.25     59.5
80    12.33     59.0
81    14.75     61.3
82    14.25     61.5
83    14.33     64.8
84    15.83     56.8
85    15.25     66.5
86    11.92     61.5
87    14.92     63.0
88    15.50     57.0
89    15.17     65.5
90    15.17     62.0
91    11.83     56.0
92    13.75     61.3
93    13.75     55.5
94    12.83     61.0
95    12.50     54.5
96    12.92     66.0
97    13.58     56.5
98    11.75     56.0
99    12.25     51.5
100   17.50     62.0
101   14.25     63.0
102   13.92     61.0
103   15.17     64.0
104   12.00     61.0
105   16.08     59.8
106   11.75     61.3
107   13.67     63.3
108   15.50     63.5
109   14.08     61.5
110   14.58     60.3
111   15.00     61.3
112   13.75     64.8
113   13.08     60.5
114   12.00     57.3
115   12.50     59.5
116   12.50     60.8
117   11.58     60.5
118   15.75     67.0
119   15.25     64.8
120   12.25     50.5
121   12.17     57.5
122   13.33     60.5
123   13.00     61.8
124   14.42     61.3
125   12.58     66.3
126   11.75     53.3
127   12.50     59.0
128   13.67     57.8
129   12.75     60.0
130   17.17     68.3
132   14.67     63.8
133   14.67     65.0
134   11.67     59.5
135   15.42     66.0
136   15.00     61.8
137   12.17     57.3
138   15.25     66.0
139   11.67     56.5
140   12.58     58.3
141   12.58     61.0
142   12.00     62.8
143   13.33     59.3
144   14.83     67.3
145   16.08     66.3
146   13.50     64.5
147   13.67     60.5
148   15.50     66.0
149   11.92     57.5
150   14.58     64.0
151   14.58     68.0
152   14.58     63.5
153   14.42     69.0
154   14.17     63.8
155   14.50     66.0
156   13.67     63.5
157   12.00     59.5
158   13.00     66.3
159   12.42     57.0
160   12.00     60.0
161   12.25     57.0
162   15.67     67.3
163   14.08     62.0
164   14.33     65.0
165   12.50     59.5
166   16.08     67.8
167   13.08     58.0
168   14.00     60.0
169   11.67     58.5
170   13.00     58.3
171   13.00     61.5
172   13.17     65.0
173   15.33     66.5
174   13.00     68.5
175   12.00     57.0
176   14.67     61.5
177   14.00     66.5
178   12.42     52.5
179   11.83     55.0
180   15.67     71.0
181   16.92     66.5
182   11.83     58.8
183   15.75     66.3
184   15.67     65.8
185   16.67     71.0
186   12.67     59.5
187   14.50     69.8
188   13.83     62.5
189   12.08     56.5
190   11.92     57.5
191   13.58     65.3
192   13.83     67.3
193   15.17     67.0
194   14.42     66.0
195   12.92     61.8
196   13.50     60.0
197   14.75     63.0
198   14.75     60.5
199   14.58     65.5
200   13.83     62.0
201   12.50     59.0
202   12.50     61.8
203   15.67     63.3
204   13.58     66.0
205   14.25     61.8
206   13.50     63.0
207   11.75     57.5
208   14.50     63.0
209   11.83     56.0
210   12.33     60.5
211   11.67     56.8
212   13.33     64.0
213   12.00     60.0
214   17.17     69.5
215   13.25     63.3
216   12.42     56.3
217   16.08     72.0
218   16.17     65.3
219   12.67     60.8
220   12.17     55.0
221   11.58     55.0
222   15.50     66.5
223   13.42     56.8
224   12.75     64.8
225   16.33     64.5
226   13.67     58.0
227   13.25     62.8
228   14.83     63.8
229   12.75     57.8
230   12.92     57.3
231   14.83     63.5
232   11.83     55.0
233   13.67     66.5
234   15.75     65.0
235   13.67     61.5
236   13.92     62.0
237   12.58     59.3

# Scatter plot of height against age
ggplot(heightweight, aes(x = ageYear, y = heightIn)) +
  geom_point()

Use a Different Shape for the Points

ggplot(heightweight, aes(x = ageYear, y = heightIn)) +
  geom_point(shape = 21)
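Shape 21 is a circle with a separate border and interior, so its appearance can be adjusted further. The sketch below is illustrative: hw is a small hypothetical data frame built from the first six rows of heightweight printed above so that it runs without gcookbook, and the size and colour values are arbitrary choices.

```r
library(ggplot2)

# First six rows of heightweight, typed in from the printout above
hw <- data.frame(
  ageYear  = c(11.92, 12.92, 12.75, 13.42, 15.92, 14.25),
  heightIn = c(56.3, 62.3, 63.3, 59.0, 62.5, 62.5)
)

# For shapes 21 to 25, colour sets the border and fill sets the interior
p_shape <- ggplot(hw, aes(x = ageYear, y = heightIn)) +
  geom_point(shape = 21, size = 3, colour = "black", fill = "orange")

print(p_shape)
```

With the full data set, replace hw with heightweight.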

Grouping Points Together using Shapes or Colors

You want to visually group points by some variable (the grouping variable), using different shapes or colors.

heightweight %>%
  select(sex, ageYear, heightIn)
    sex ageYear heightIn
1     f   11.92     56.3
2     f   12.92     62.3
3     f   12.75     63.3
4     f   13.42     59.0
5     f   15.92     62.5
6     f   14.25     62.5
7     f   15.42     59.0
8     f   11.83     56.5
9     f   13.33     62.0
10    f   11.67     53.8
11    f   11.58     61.5
12    f   14.83     61.5
13    f   13.08     64.5
14    f   12.42     58.3
15    f   11.92     51.3
16    f   12.08     58.8
17    f   15.92     65.3
18    f   12.50     59.5
19    f   12.25     61.3
20    f   15.00     63.3
21    f   11.75     61.8
22    f   11.67     53.5
23    f   13.67     58.0
24    f   14.67     61.3
25    f   15.42     63.3
26    f   13.83     61.5
27    f   14.58     60.8
28    f   15.00     59.0
29    f   17.50     65.5
30    f   12.17     56.3
31    f   14.17     64.3
32    f   13.50     58.0
33    f   12.42     64.3
34    f   11.58     57.5
35    f   15.50     57.8
36    f   16.42     61.5
37    f   14.08     62.3
38    f   14.75     61.8
39    f   15.42     65.3
40    f   15.17     58.3
41    f   14.42     62.8
42    f   13.83     59.3
43    f   14.00     61.5
44    f   14.08     62.0
45    f   12.50     61.3
46    f   15.33     62.3
47    f   11.58     52.8
48    f   12.25     59.8
49    f   12.00     59.5
50    f   14.75     61.3
51    f   14.83     63.5
52    f   16.42     64.8
53    f   12.17     60.0
54    f   12.08     59.0
55    f   12.25     55.8
56    f   12.08     57.8
57    f   12.92     61.3
58    f   13.92     62.3
59    f   15.25     64.3
60    f   11.92     55.5
61    f   15.25     64.5
62    f   15.42     60.0
63    f   12.33     56.3
64    f   12.25     58.3
65    f   12.83     60.0
66    f   13.00     54.5
67    f   12.00     55.8
68    f   12.83     62.8
69    f   12.67     60.5
70    f   15.92     63.3
71    f   15.83     66.8
72    f   11.67     60.0
73    f   12.33     60.5
74    f   15.75     64.3
75    f   11.92     58.3
76    f   14.83     66.5
77    f   13.67     65.3
78    f   13.08     60.5
79    f   12.25     59.5
80    f   12.33     59.0
81    f   14.75     61.3
82    f   14.25     61.5
83    f   14.33     64.8
84    f   15.83     56.8
85    f   15.25     66.5
86    f   11.92     61.5
87    f   14.92     63.0
88    f   15.50     57.0
89    f   15.17     65.5
90    f   15.17     62.0
91    f   11.83     56.0
92    f   13.75     61.3
93    f   13.75     55.5
94    f   12.83     61.0
95    f   12.50     54.5
96    f   12.92     66.0
97    f   13.58     56.5
98    f   11.75     56.0
99    f   12.25     51.5
100   f   17.50     62.0
101   f   14.25     63.0
102   f   13.92     61.0
103   f   15.17     64.0
104   f   12.00     61.0
105   f   16.08     59.8
106   f   11.75     61.3
107   f   13.67     63.3
108   f   15.50     63.5
109   f   14.08     61.5
110   f   14.58     60.3
111   f   15.00     61.3
112   m   13.75     64.8
113   m   13.08     60.5
114   m   12.00     57.3
115   m   12.50     59.5
116   m   12.50     60.8
117   m   11.58     60.5
118   m   15.75     67.0
119   m   15.25     64.8
120   m   12.25     50.5
121   m   12.17     57.5
122   m   13.33     60.5
123   m   13.00     61.8
124   m   14.42     61.3
125   m   12.58     66.3
126   m   11.75     53.3
127   m   12.50     59.0
128   m   13.67     57.8
129   m   12.75     60.0
130   m   17.17     68.3
132   m   14.67     63.8
133   m   14.67     65.0
134   m   11.67     59.5
135   m   15.42     66.0
136   m   15.00     61.8
137   m   12.17     57.3
138   m   15.25     66.0
139   m   11.67     56.5
140   m   12.58     58.3
141   m   12.58     61.0
142   m   12.00     62.8
143   m   13.33     59.3
144   m   14.83     67.3
145   m   16.08     66.3
146   m   13.50     64.5
147   m   13.67     60.5
148   m   15.50     66.0
149   m   11.92     57.5
150   m   14.58     64.0
151   m   14.58     68.0
152   m   14.58     63.5
153   m   14.42     69.0
154   m   14.17     63.8
155   m   14.50     66.0
156   m   13.67     63.5
157   m   12.00     59.5
158   m   13.00     66.3
159   m   12.42     57.0
160   m   12.00     60.0
161   m   12.25     57.0
162   m   15.67     67.3
163   m   14.08     62.0
164   m   14.33     65.0
165   m   12.50     59.5
166   m   16.08     67.8
167   m   13.08     58.0
168   m   14.00     60.0
169   m   11.67     58.5
170   m   13.00     58.3
171   m   13.00     61.5
172   m   13.17     65.0
173   m   15.33     66.5
174   m   13.00     68.5
175   m   12.00     57.0
176   m   14.67     61.5
177   m   14.00     66.5
178   m   12.42     52.5
179   m   11.83     55.0
180   m   15.67     71.0
181   m   16.92     66.5
182   m   11.83     58.8
183   m   15.75     66.3
184   m   15.67     65.8
185   m   16.67     71.0
186   m   12.67     59.5
187   m   14.50     69.8
188   m   13.83     62.5
189   m   12.08     56.5
190   m   11.92     57.5
191   m   13.58     65.3
192   m   13.83     67.3
193   m   15.17     67.0
194   m   14.42     66.0
195   m   12.92     61.8
196   m   13.50     60.0
197   m   14.75     63.0
198   m   14.75     60.5
199   m   14.58     65.5
200   m   13.83     62.0
201   m   12.50     59.0
202   m   12.50     61.8
203   m   15.67     63.3
204   m   13.58     66.0
205   m   14.25     61.8
206   m   13.50     63.0
207   m   11.75     57.5
208   m   14.50     63.0
209   m   11.83     56.0
210   m   12.33     60.5
211   m   11.67     56.8
212   m   13.33     64.0
213   m   12.00     60.0
214   m   17.17     69.5
215   m   13.25     63.3
216   m   12.42     56.3
217   m   16.08     72.0
218   m   16.17     65.3
219   m   12.67     60.8
220   m   12.17     55.0
221   m   11.58     55.0
222   m   15.50     66.5
223   m   13.42     56.8
224   m   12.75     64.8
225   m   16.33     64.5
226   m   13.67     58.0
227   m   13.25     62.8
228   m   14.83     63.8
229   m   12.75     57.8
230   m   12.92     57.3
231   m   14.83     63.5
232   m   11.83     55.0
233   m   13.67     66.5
234   m   15.75     65.0
235   m   13.67     61.5
236   m   13.92     62.0
237   m   12.58     59.3

# Scatter plot with shape and colour both mapped to sex
ggplot(heightweight, aes(x = ageYear, y = heightIn, shape = sex, colour = sex)) +
  geom_point()
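Grouping works because colour and shape are placed inside aes(), which maps them to a variable. A value given outside aes() is applied to every point instead. The sketch below contrasts the two; hw is a hypothetical six-row extract of heightweight (rows 1 to 3 and 112 to 114 from the printout above), and the colour names are arbitrary.

```r
library(ggplot2)

# Three female and three male rows typed in from the printout above
hw <- data.frame(
  sex      = c("f", "f", "f", "m", "m", "m"),
  ageYear  = c(11.92, 12.92, 12.75, 13.75, 13.08, 12.00),
  heightIn = c(56.3, 62.3, 63.3, 64.8, 60.5, 57.3)
)

# Mapped: colour depends on sex, and a legend is added automatically
p_mapped <- ggplot(hw, aes(x = ageYear, y = heightIn, colour = sex)) +
  geom_point() +
  scale_colour_manual(values = c(f = "purple", m = "darkgreen"))

# Set: every point is blue, and no legend is drawn
p_fixed <- ggplot(hw, aes(x = ageYear, y = heightIn)) +
  geom_point(colour = "blue")

print(p_mapped)
print(p_fixed)
```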

Apply facet_wrap() to split the plot into panels by a categorical variable

ggplot(heightweight, aes(x = ageYear, y = heightIn, colour = sex)) +
  geom_point() +
  facet_wrap(~sex)
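The panel layout and labels of facet_wrap() can be adjusted. In this sketch, hw is again a hypothetical six-row extract of heightweight, and the panel titles assume that f and m stand for female and male, which the printout does not state explicitly.

```r
library(ggplot2)

# Three rows per group, typed in from the printout above
hw <- data.frame(
  sex      = c("f", "f", "f", "m", "m", "m"),
  ageYear  = c(11.92, 12.92, 12.75, 13.75, 13.08, 12.00),
  heightIn = c(56.3, 62.3, 63.3, 64.8, 60.5, 57.3)
)

p_facet <- ggplot(hw, aes(x = ageYear, y = heightIn, colour = sex)) +
  geom_point() +
  # ncol = 1 stacks the panels vertically; labeller renames the panel titles
  facet_wrap(~sex, ncol = 1,
             labeller = labeller(sex = c(f = "Female", m = "Male"))) +
  # The panel titles already identify the groups, so drop the legend
  theme(legend.position = "none")

print(p_facet)
```

Stacking the panels keeps the x-axis aligned, which makes it easier to compare the two groups at the same age.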

Mapping a Continuous Variable to Color or Size

A basic scatter plot shows the relationship between two continuous variables: one mapped to the x-axis, and one to the y-axis. When there are more than two continuous variables, these additional variables must be mapped to other aesthetics, like size and color.

heightweight %>%
  select(sex, ageYear, heightIn, weightLb)
    sex ageYear heightIn weightLb
1     f   11.92     56.3     85.0
2     f   12.92     62.3    105.0
3     f   12.75     63.3    108.0
4     f   13.42     59.0     92.0
5     f   15.92     62.5    112.5
6     f   14.25     62.5    112.0
7     f   15.42     59.0    104.0
8     f   11.83     56.5     69.0
9     f   13.33     62.0     94.5
10    f   11.67     53.8     68.5
11    f   11.58     61.5    104.0
12    f   14.83     61.5    103.5
13    f   13.08     64.5    123.5
14    f   12.42     58.3     93.0
15    f   11.92     51.3     50.5
16    f   12.08     58.8     89.0
17    f   15.92     65.3    107.0
18    f   12.50     59.5     78.5
19    f   12.25     61.3    115.0
20    f   15.00     63.3    114.0
21    f   11.75     61.8     85.0
22    f   11.67     53.5     81.0
23    f   13.67     58.0     83.5
24    f   14.67     61.3    112.0
25    f   15.42     63.3    101.0
26    f   13.83     61.5    103.5
27    f   14.58     60.8     93.5
28    f   15.00     59.0    112.0
29    f   17.50     65.5    140.0
30    f   12.17     56.3     83.5
31    f   14.17     64.3     90.0
32    f   13.50     58.0     84.0
33    f   12.42     64.3    110.5
34    f   11.58     57.5     96.0
35    f   15.50     57.8     95.0
36    f   16.42     61.5    121.0
37    f   14.08     62.3     99.5
38    f   14.75     61.8    142.5
39    f   15.42     65.3    118.0
40    f   15.17     58.3    104.5
41    f   14.42     62.8    102.5
42    f   13.83     59.3     89.5
43    f   14.00     61.5     95.0
44    f   14.08     62.0     98.5
45    f   12.50     61.3     94.0
46    f   15.33     62.3    108.0
47    f   11.58     52.8     63.5
48    f   12.25     59.8     84.5
49    f   12.00     59.5     93.5
50    f   14.75     61.3    112.0
51    f   14.83     63.5    148.5
52    f   16.42     64.8    112.0
53    f   12.17     60.0    109.0
54    f   12.08     59.0     91.5
55    f   12.25     55.8     75.0
56    f   12.08     57.8     84.0
57    f   12.92     61.3    107.0
58    f   13.92     62.3     92.5
59    f   15.25     64.3    109.5
60    f   11.92     55.5     84.0
61    f   15.25     64.5    102.5
62    f   15.42     60.0    106.0
63    f   12.33     56.3     77.0
64    f   12.25     58.3    111.5
65    f   12.83     60.0    114.0
66    f   13.00     54.5     75.0
67    f   12.00     55.8     73.5
68    f   12.83     62.8     93.5
69    f   12.67     60.5    105.0
70    f   15.92     63.3    113.5
71    f   15.83     66.8    140.0
72    f   11.67     60.0     77.0
73    f   12.33     60.5     84.5
74    f   15.75     64.3    113.5
75    f   11.92     58.3     77.5
76    f   14.83     66.5    117.5
77    f   13.67     65.3     98.0
78    f   13.08     60.5    112.0
79    f   12.25     59.5    101.0
80    f   12.33     59.0     95.0
81    f   14.75     61.3     81.0
82    f   14.25     61.5     91.0
83    f   14.33     64.8    142.0
84    f   15.83     56.8     98.5
85    f   15.25     66.5    112.0
86    f   11.92     61.5    116.5
87    f   14.92     63.0     98.5
88    f   15.50     57.0     83.5
89    f   15.17     65.5    133.0
90    f   15.17     62.0     91.5
91    f   11.83     56.0     72.5
92    f   13.75     61.3    106.5
93    f   13.75     55.5     67.0
94    f   12.83     61.0    122.5
95    f   12.50     54.5     74.0
96    f   12.92     66.0    144.5
97    f   13.58     56.5     84.0
98    f   11.75     56.0     72.5
99    f   12.25     51.5     64.0
100   f   17.50     62.0    116.0
101   f   14.25     63.0     84.0
102   f   13.92     61.0     93.5
103   f   15.17     64.0    111.5
104   f   12.00     61.0     92.0
105   f   16.08     59.8    115.0
106   f   11.75     61.3     85.0
107   f   13.67     63.3    108.0
108   f   15.50     63.5    108.0
109   f   14.08     61.5     85.0
110   f   14.58     60.3     86.0
111   f   15.00     61.3    110.5
112   m   13.75     64.8     98.0
113   m   13.08     60.5    105.0
114   m   12.00     57.3     76.5
115   m   12.50     59.5     84.0
116   m   12.50     60.8    128.0
117   m   11.58     60.5     87.0
118   m   15.75     67.0    128.0
119   m   15.25     64.8    111.0
120   m   12.25     50.5     79.0
121   m   12.17     57.5     90.0
122   m   13.33     60.5     84.0
123   m   13.00     61.8    112.0
124   m   14.42     61.3     93.0
125   m   12.58     66.3    117.0
126   m   11.75     53.3     84.0
127   m   12.50     59.0     99.5
128   m   13.67     57.8     95.0
129   m   12.75     60.0     84.0
130   m   17.17     68.3    134.0
132   m   14.67     63.8     98.5
133   m   14.67     65.0    118.5
134   m   11.67     59.5     94.5
135   m   15.42     66.0    105.0
136   m   15.00     61.8    104.0
137   m   12.17     57.3     83.0
138   m   15.25     66.0    105.5
139   m   11.67     56.5     84.0
140   m   12.58     58.3     86.0
141   m   12.58     61.0     81.0
142   m   12.00     62.8     94.0
143   m   13.33     59.3     78.5
144   m   14.83     67.3    119.5
145   m   16.08     66.3    133.0
146   m   13.50     64.5    119.0
147   m   13.67     60.5     95.0
148   m   15.50     66.0    112.0
149   m   11.92     57.5     75.0
150   m   14.58     64.0     92.0
151   m   14.58     68.0    112.0
152   m   14.58     63.5     98.5
153   m   14.42     69.0    112.5
154   m   14.17     63.8    112.5
155   m   14.50     66.0    108.0
156   m   13.67     63.5    108.0
157   m   12.00     59.5     88.0
158   m   13.00     66.3    106.0
159   m   12.42     57.0     92.0
160   m   12.00     60.0    117.5
161   m   12.25     57.0     84.0
162   m   15.67     67.3    112.0
163   m   14.08     62.0    100.0
164   m   14.33     65.0    112.0
165   m   12.50     59.5     84.0
166   m   16.08     67.8    127.5
167   m   13.08     58.0     80.5
168   m   14.00     60.0     93.5
169   m   11.67     58.5     86.5
170   m   13.00     58.3     92.5
171   m   13.00     61.5    108.5
172   m   13.17     65.0    121.0
173   m   15.33     66.5    112.0
174   m   13.00     68.5    114.0
175   m   12.00     57.0     84.0
176   m   14.67     61.5     81.0
177   m   14.00     66.5    111.5
178   m   12.42     52.5     81.0
179   m   11.83     55.0     70.0
180   m   15.67     71.0    140.0
181   m   16.92     66.5    117.0
182   m   11.83     58.8     84.0
183   m   15.75     66.3    112.0
184   m   15.67     65.8    150.5
185   m   16.67     71.0    147.0
186   m   12.67     59.5    105.0
187   m   14.50     69.8    119.5
188   m   13.83     62.5     84.0
189   m   12.08     56.5     91.0
190   m   11.92     57.5    101.0
191   m   13.58     65.3    117.5
192   m   13.83     67.3    121.0
193   m   15.17     67.0    133.0
194   m   14.42     66.0    112.0
195   m   12.92     61.8     91.5
196   m   13.50     60.0    105.0
197   m   14.75     63.0    111.0
198   m   14.75     60.5    112.0
199   m   14.58     65.5    114.0
200   m   13.83     62.0     91.0
201   m   12.50     59.0     98.0
202   m   12.50     61.8    118.0
203   m   15.67     63.3    115.5
204   m   13.58     66.0    112.0
205   m   14.25     61.8    112.0
206   m   13.50     63.0     91.0
207   m   11.75     57.5     85.0
208   m   14.50     63.0    112.0
209   m   11.83     56.0     87.5
210   m   12.33     60.5    118.0
211   m   11.67     56.8     83.5
212   m   13.33     64.0    116.0
213   m   12.00     60.0     89.0
214   m   17.17     69.5    171.5
215   m   13.25     63.3    112.0
216   m   12.42     56.3     72.0
217   m   16.08     72.0    150.0
218   m   16.17     65.3    134.5
219   m   12.67     60.8     97.0
220   m   12.17     55.0     71.5
221   m   11.58     55.0     73.5
222   m   15.50     66.5    112.0
223   m   13.42     56.8     75.0
224   m   12.75     64.8    128.0
225   m   16.33     64.5     98.0
226   m   13.67     58.0     84.0
227   m   13.25     62.8     99.0
228   m   14.83     63.8    112.0
229   m   12.75     57.8     79.5
230   m   12.92     57.3     80.5
231   m   14.83     63.5    102.5
232   m   11.83     55.0     76.0
233   m   13.67     66.5    112.0
234   m   15.75     65.0    114.0
235   m   13.67     61.5    140.0
236   m   13.92     62.0    107.5
237   m   12.58     59.3     87.0
ggplot(heightweight, aes(x = ageYear, y = heightIn, colour = weightLb)) +
  geom_point()

Adding Fitted Lines

You want to add lines from a fitted regression model to a scatter plot.

# We'll use the heightweight data set and create a base plot called `hw_sp` (for heightweight scatter plot)
hw_sp <- ggplot(heightweight, aes(x = ageYear, y = heightIn))

hw_sp +
  geom_point() +
  stat_smooth(method = lm, se = FALSE)
`geom_smooth()` using formula = 'y ~ x'

# 95% confidence region (the default level)
hw_sp +
  geom_point() +
  stat_smooth(method = lm, level = 0.95)
`geom_smooth()` using formula = 'y ~ x'
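
The line that `stat_smooth(method = lm)` draws is an ordinary least-squares fit, so it can also be reproduced by fitting `lm()` directly and drawing the result with `geom_abline()`. The following is a minimal sketch, assuming the built-in `women` data set as a stand-in for `heightweight` so that it runs without gcookbook. The names `fit`, `coefs` and `p_fit` are illustrative only.

```r
library(ggplot2)

# Assumption: the built-in `women` data (height, weight) stand in for
# heightweight so the sketch is self-contained
fit   <- lm(weight ~ height, data = women)
coefs <- coef(fit)   # named vector: "(Intercept)" and "height"

p_fit <- ggplot(women, aes(x = height, y = weight)) +
  geom_point() +
  # Draw the fitted line from the model coefficients
  geom_abline(intercept = coefs[["(Intercept)"]],
              slope     = coefs[["height"]],
              colour    = "blue")

p_fit
```

Fitting the model yourself is useful when you also want the coefficients or `summary(fit)` alongside the plot.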

Customizing labels and title

hw_sp +
  geom_point() +
  stat_smooth(method = lm, se = FALSE) + 
  labs(x = "Age",
       y = "Height",
       title = "Age vs Height") +
  theme_bw()
`geom_smooth()` using formula = 'y ~ x'

2.2 Line Graph

Basic Line Graph

ggplot(BOD, aes(x = Time, y = demand)) +
  geom_line()

Adding Points to a Line Graph

ggplot(BOD, aes(x = Time, y = demand)) +
  geom_line() +
  geom_point()

Changing the Appearance of Lines

ggplot(BOD, aes(x = Time, y = demand)) +
  geom_line(linetype = "dashed", linewidth = 1, colour = "blue")

Making a Line Graph with Multiple Lines

In addition to the variables mapped to the x- and y-axes, map another (discrete) variable to colour or linetype.

tg
  supp dose length
1   OJ  0.5  13.23
2   OJ  1.0  22.70
3   OJ  2.0  26.06
4   VC  0.5   7.98
5   VC  1.0  16.77
6   VC  2.0  26.14
ggplot(tg, aes(x = dose, y = length, colour = supp)) +
  geom_line()
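
The values in `tg` match the mean tooth length for each `supp` and `dose` combination in the built-in `ToothGrowth` data, so a table like it can be rebuilt as sketched below. This is an assumption about how `tg` was produced; `tg_sketch` and `p_lines` are illustrative names. The plot maps `supp` to `linetype` instead of `colour`, the other option mentioned above.

```r
library(ggplot2)

# Mean tooth length for every supplement/dose combination
tg_sketch <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)

# Rename and reorder to resemble the table printed above
names(tg_sketch)[names(tg_sketch) == "len"] <- "length"
tg_sketch <- tg_sketch[order(tg_sketch$supp, tg_sketch$dose), ]

# One line per supplement, distinguished by line type
p_lines <- ggplot(tg_sketch, aes(x = dose, y = length, linetype = supp)) +
  geom_line()

p_lines
```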

2.3 Bar Graphs

Bar graphs are perhaps the most commonly used kind of data visualization. They’re typically used to display numeric values (on the y-axis), for different categories (on the x-axis).

Basic Bar Graph

pg_mean
  group weight
1  ctrl  5.032
2  trt1  4.661
3  trt2  5.526
ggplot(pg_mean, aes(x = group, y = weight)) +
  geom_col()

Adding Colour to the Bars

ggplot(pg_mean, aes(x = group, y = weight)) +
  geom_col(fill = "purple", colour = "black")

Adjusting Bar Width and Spacing

Narrower Bar

ggplot(pg_mean, aes(x = group, y = weight)) +
  geom_col(width = 0.5)

Wider Bar

ggplot(pg_mean, aes(x = group, y = weight)) +
  # Wider than the 0.5 used above; the default width is 0.9, and 1 makes the bars touch
  geom_col(width = 0.8)

Grouping Bars Together

In this example we’ll use the cabbage_exp data set, which has two categorical variables, Cultivar and Date, and one continuous variable, Weight:

cabbage_exp
  Cultivar Date Weight        sd  n         se
1      c39  d16   3.18 0.9566144 10 0.30250803
2      c39  d20   2.80 0.2788867 10 0.08819171
3      c39  d21   2.74 0.9834181 10 0.31098410
4      c52  d16   2.26 0.4452215 10 0.14079141
5      c52  d20   3.11 0.7908505 10 0.25008887
6      c52  d21   1.47 0.2110819 10 0.06674995
ggplot(cabbage_exp, aes(x = Date, y = Weight, fill = Cultivar)) +
  geom_col(position = "dodge")

Making a Stacked Bar Graph

ggplot(cabbage_exp, aes(x = Date, y = Weight, fill = Cultivar)) +
  geom_col()

Making a Proportional Stacked Bar Graph

ggplot(cabbage_exp, aes(x = Date, y = Weight, fill = Cultivar)) +
  geom_col(position = "fill")
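
`position = "fill"` rescales each stack so that it sums to 1. The same shares can be computed by hand, as in this sketch, which retypes the `Cultivar`, `Date` and `Weight` columns printed above into a small data frame. The names `cabbage_sketch`, `share` and `p_share` are illustrative.

```r
library(ggplot2)

# Values copied from the cabbage_exp printout above
cabbage_sketch <- data.frame(
  Cultivar = rep(c("c39", "c52"), each = 3),
  Date     = rep(c("d16", "d20", "d21"), times = 2),
  Weight   = c(3.18, 2.80, 2.74, 2.26, 3.11, 1.47)
)

# Share of each cultivar within its date: Weight divided by the date total
cabbage_sketch$share <- with(cabbage_sketch,
                             Weight / ave(Weight, Date, FUN = sum))

# A plain stacked bar of the shares, with every stack reaching 1
p_share <- ggplot(cabbage_sketch, aes(x = Date, y = share, fill = Cultivar)) +
  geom_col()

p_share
```

Computing the shares explicitly is handy when you also want to print them as labels.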

Adding Labels to a Bar Graph

ggplot(cabbage_exp, aes(x = Date, y = Weight, fill = Cultivar)) +
  geom_col(position = "dodge") +
  geom_text(
    aes(label = Weight),
    colour = "black", size = 3,
    vjust = 1.5, position = position_dodge(0.9)
  ) +
  labs(x = "Date",
       y = "Weight",
       title = "Grouped Bars with Labels")

2.4 Summarized Data Distributions

Histogram

library(MASS)

Attaching package: 'MASS'
The following object is masked from 'package:dplyr':

    select
ggplot(birthwt, aes(x = bwt)) +
  geom_histogram(fill = "purple", colour = "black")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
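
The message above appears because no bin width was given. Here is a sketch that sets it explicitly. The value of 250 (grams) is an arbitrary choice for illustration, and `MASS::birthwt` is referenced directly so that the block runs without attaching MASS. `p_hist` is an illustrative name.

```r
library(ggplot2)

# Each bar now covers a 250 g range of birth weight
p_hist <- ggplot(MASS::birthwt, aes(x = bwt)) +
  geom_histogram(binwidth = 250, fill = "purple", colour = "black")

p_hist
```

Trying a few bin widths is worthwhile, because the apparent shape of a distribution can change with the choice.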

Making Multiple Histograms from Grouped Data

birthwt_mod <- birthwt
# Convert smoke to a factor and reassign new names
birthwt_mod$smoke <- recode_factor(birthwt_mod$smoke, '0' = 'No Smoke', '1' = 'Smoke')

ggplot(birthwt_mod, aes(x = bwt)) +
  geom_histogram(fill = "purple", colour = "black") +
  facet_grid(smoke ~ .)
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Making a Basic Box Plot

ggplot(birthwt, aes(x = factor(race), y = bwt)) +
  geom_boxplot()
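
The x-axis above shows the raw codes 1, 2 and 3. According to the `MASS::birthwt` documentation these stand for white, black and other, so the factor can be given readable labels. A sketch, with `birthwt_lab` and `p_box` as illustrative names:

```r
library(ggplot2)

birthwt_lab <- MASS::birthwt

# Replace the numeric race codes with descriptive labels
birthwt_lab$race <- factor(birthwt_lab$race,
                           levels = c(1, 2, 3),
                           labels = c("white", "black", "other"))

p_box <- ggplot(birthwt_lab, aes(x = race, y = bwt)) +
  geom_boxplot()

p_box
```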

Making a Density Plot of Two-Dimensional Data

# Save a base plot object
faithful_p <- ggplot(faithful, aes(x = eruptions, y = waiting))

faithful_p +
  geom_point() +
  stat_density2d()

# Contour lines, with "height" mapped to color
faithful_p +
  stat_density2d(aes(colour = after_stat(level)))

# Map density estimate to fill color
faithful_p +
  stat_density2d(aes(fill = after_stat(density)), geom = "raster", contour = FALSE)

2.5 Visualization and Relationship

view(mpg)

A data frame with 234 rows and 11 variables:

  • manufacturer: manufacturer name

  • model: model name

  • displ: engine displacement, in litres

  • year: year of manufacture

  • cyl: number of cylinders

  • trans: type of transmission

  • drv: the type of drive train, where f = front-wheel drive, r = rear wheel drive, 4 = 4wd

  • cty: city miles per gallon

  • hwy: highway miles per gallon

  • fl: fuel type

  • class: “type” of car

Does engine size have a relationship with fuel efficiency?

mpg %>% 
  ggplot(aes(displ,cty)) +
  geom_point()
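
A quick numeric check can accompany the scatter plot. This sketch computes the Pearson correlation between `displ` and `cty`. A negative value means that cars with larger engines tend to have lower city mileage. `r_displ_cty` is an illustrative name.

```r
library(ggplot2)  # provides the mpg data set

# Pearson correlation between engine displacement and city miles per gallon
r_displ_cty <- cor(mpg$displ, mpg$cty)

r_displ_cty
```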

View the plot according to some category

mpg %>% 
  ggplot(aes(displ,cty)) +
  geom_point(aes(colour = drv))

Add a trend line to the plot

mpg %>% 
  ggplot(aes(displ,cty)) +
  geom_point(aes(colour = trans)) +
  geom_smooth()
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'

Fit a linear trend line

mpg %>% 
  ggplot(aes(displ,cty)) +
  geom_point(aes(colour = trans)) +
  geom_smooth(method = lm)
`geom_smooth()` using formula = 'y ~ x'

mpg %>% 
  ggplot(aes(displ,cty)) +
  geom_point(aes(colour = trans)) +
  geom_smooth(method = lm) +
  facet_wrap(~drv)
`geom_smooth()` using formula = 'y ~ x'

Customizing labels and title

mpg %>% 
  ggplot(aes(displ,cty)) +
  geom_point(aes(colour = trans)) +
  geom_smooth(method = lm) +
  facet_wrap(~drv) +
  labs(x = "Engine Size",
       y = "City Miles per Gallon",
       title = "Fuel Efficiency") +
  theme_bw()
`geom_smooth()` using formula = 'y ~ x'

Saving your plot

mpg %>% 
  ggplot(aes(displ,cty)) +
  geom_point(aes(colour = trans)) +
  geom_smooth(method = lm) +
  facet_wrap(~drv) +
  labs(x = "Engine Size",
       y = "City Miles per Gallon",
       title = "Fuel Efficiency") +
  theme_bw() 
`geom_smooth()` using formula = 'y ~ x'

ggsave('mpg.pdf')
Saving 7 x 5 in image
`geom_smooth()` using formula = 'y ~ x'
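
Called with only a file name, `ggsave()` saves the last plot displayed, at the size of the current graphics device, which is where the 7 x 5 in message comes from. The sketch below passes the plot and the size explicitly. The object name `p_mpg` and the temporary output path are assumptions made for illustration.

```r
library(ggplot2)

# Keep the plot in an object so it can be passed to ggsave() explicitly
p_mpg <- ggplot(mpg, aes(displ, cty)) +
  geom_point(aes(colour = drv)) +
  theme_bw()

# Hypothetical output location; replace with a path of your own
out_file <- file.path(tempdir(), "mpg_sketch.pdf")

# State the plot and its dimensions rather than relying on defaults
ggsave(out_file, plot = p_mpg, width = 7, height = 5, units = "in")
```

The output format follows the file extension, so a name ending in `.png` would give a PNG file instead.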

Other task: Animated chart

# Load libraries
library(gganimate)
Warning: package 'gganimate' was built under R version 4.4.1
p <- ggplot(mpg, aes(displ, cty,colour = trans)) +
  geom_point() +
  scale_x_log10() +
  theme_bw() +
  labs(title = 'Year: {frame_time}', x = 'Engine Size', y = 'City Miles per Gallon') +
  transition_time(year) +
  ease_aes('linear')

# Render the animation and save it as a GIF
anim <- animate(p)
anim_save("gif_chart.gif", animation = anim)
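
`transition_time(year)` moves through the values of `year`, so it helps to check which values are present before animating. In `mpg` there are only two model years, so the animation moves between two states. A sketch of that check, with `years` as an illustrative name:

```r
library(ggplot2)  # provides the mpg data set

# Distinct model years available for the animation
years <- sort(unique(mpg$year))
years

# Number of cars recorded for each year
table(mpg$year)
```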

3.0 Task

  1. Get data of your interest.

  2. Make 5 or 6 plots to tell us about your data.

  3. Customize your plots: change the colour, title, axes, etc.

Further Reading